229 research outputs found

    Redesigning OP2 Compiler to Use HPX Runtime Asynchronous Techniques

    Maximizing the level of parallelism in applications can be achieved by minimizing overheads due to load imbalances and waiting time due to memory latencies. Compiler optimization is one of the most effective solutions to tackle this problem. The compiler is able to detect the data dependencies in an application and to analyze specific sections of code for parallelization potential. However, these compiler techniques are usually applied at compile time, so they rely on static analysis, which is insufficient for achieving maximum parallelism and producing the desired application scalability. One solution to address this challenge is the use of runtime methods. This strategy can be implemented by delaying a certain amount of code analysis until runtime. In this research, we improve the performance of parallel applications generated by the OP2 compiler by leveraging HPX, a C++ runtime system, to provide runtime optimizations. These optimizations include asynchronous tasking, loop interleaving, dynamic chunk sizing, and data prefetching. The results of the research were evaluated using an Airfoil application, which showed a 40-50% improvement in parallel performance. Comment: 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017)

    The stomatal response of Sambucus nigra and Aegopodium podagraria as a function of light and air humidity - in situ observations and gas exchange measurements in the field

    Simultaneous microscopic observations of stomatal movements and measurements of CO2/H2O gas exchange were carried out on intact plants of the species Sambucus nigra L. and Aegopodium podagraria L. at a field site. Recording the responses under the natural microclimate and under controlled light, air humidity, and temperature conditions led to a description of the interaction between the factors air humidity and light intensity. While A. podagraria, with generally low stomatal activity, optimized its photosynthetic use of sporadically occurring light flecks within the stand at the cost of increased transpiration, S. nigra limited transpiration through a sensitive response to humidity. The results are interpreted with regard to growth form and site conditions.

    Shared memory parallelism in Modern C++ and HPX

    Parallel programming remains a daunting challenge, from the struggle to express a parallel algorithm without cluttering the underlying synchronous logic, to describing which devices to employ in a calculation, to correctness. Over the years, numerous solutions have arisen, many of them requiring new programming languages, extensions to programming languages, or the addition of pragmas. Support for these various tools and extensions is available to a varying degree. In recent years, the C++ standards committee has worked to refine the language features and libraries needed to support parallel programming on a single computational node. Eventually, all major vendors and compilers will provide robust and performant implementations of these standards. Until then, the HPX library and runtime provide cutting-edge implementations of the standards, as well as proposed standards and extensions. Because of these advances, it is now possible to write high-performance parallel code without custom extensions to C++. We provide an overview of modern parallel programming in C++, describing the language and library features, and providing brief examples of how to use them.

    Stellar Mergers with HPX-Kokkos and SYCL: Methods of using an Asynchronous Many-Task Runtime System with SYCL

    Ranging from NVIDIA GPUs to AMD GPUs and Intel GPUs: Given the heterogeneity of available accelerator cards within current supercomputers, portability is a key aspect for modern HPC applications. In Octo-Tiger, we rely on Kokkos and its various execution spaces for portable compute kernels. In turn, we use HPX to coordinate kernel launches, CPU tasks, and communication. This combination allows us to have a fine interleaving between portable CPU/GPU computations and communication, enabling scalability on various supercomputers. However, for HPX and Kokkos to work together optimally, we need to be able to treat Kokkos kernels as HPX tasks. Otherwise, instead of integrating asynchronous Kokkos kernel launches into HPX's task graph, we would have to actively wait for them with fence commands, which wastes CPU time better spent otherwise. Using an integration layer called HPX-Kokkos, treating Kokkos kernels as tasks already works for some Kokkos execution spaces (like the CUDA one), but not for others (like the SYCL one). In this work, we started making Octo-Tiger and HPX itself compatible with SYCL. To do so, we introduce numerous software changes, most notably an HPX-SYCL integration. This integration allows us to treat SYCL events as HPX tasks, which in turn allows us to better integrate Kokkos by extending the support of HPX-Kokkos to also fully support Kokkos' SYCL execution space. We show two ways to implement this HPX-SYCL integration and test them using Octo-Tiger and its Kokkos kernels, on both an NVIDIA A100 and an AMD MI100. We find modest, yet noticeable, speedups by enabling this integration, even when just running simple single-node scenarios with Octo-Tiger where communication and CPU utilization are not yet an issue.